Docker & Containerization Guide – OpenSource Tech Guides

1. Core Containerization Concepts

Understanding the foundations: containers vs VMs, images, layers, and the Docker ecosystem.

Containers vs Virtual Machines

Containers share the host OS kernel, making them lightweight and fast to start. VMs each run a full guest OS on virtualized hardware, which requires more resources but provides stronger isolation.

Container: Process-level isolation, shared kernel
VM: Hardware-level virtualization, full OS
Startup: Containers (typically seconds or less) vs VMs (tens of seconds to minutes)
Typical footprint: Containers tens to hundreds of MB vs VMs several GB (each VM carries a full guest OS)

Images and Layers

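Docker images are built from read-only layers. Instructions that change the filesystem (RUN, COPY, ADD) each create a new layer, while others such as ENV or CMD only add image metadata. Layers are shared and cached, enabling efficient storage and distribution.

Base layer: Usually a minimal OS userland (alpine, ubuntu) or the empty scratch image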
Application layers: Dependencies, code, configuration
Layer caching: Unchanged layers reused across builds
Registry: Docker Hub, ECR, GCR for image storage
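
The layer caching rule can be illustrated with a toy model. This is a simplified sketch, not Docker's actual cache implementation: it assumes each layer's cache key is a hash of the parent key, the instruction text, and (for COPY) the copied content. The function names and the sample build steps are hypothetical.

```python
import hashlib


def layer_keys(instructions, file_contents):
    """Return one cache key per instruction.

    instructions: list of (verb, argument) tuples, e.g. ("COPY", "package.json").
    file_contents: dict mapping a copied path to its bytes (a stand-in build context).
    """
    keys = []
    parent = ""
    for verb, arg in instructions:
        digest = hashlib.sha256()
        digest.update(parent.encode())           # chain to the previous layer
        digest.update(f"{verb} {arg}".encode())  # the instruction itself
        if verb == "COPY":
            digest.update(file_contents[arg])    # content of the copied files
        parent = digest.hexdigest()
        keys.append(parent)
    return keys


def reused_layers(old_keys, new_keys):
    """Count leading layers whose keys match; the first miss ends cache reuse."""
    count = 0
    for old, new in zip(old_keys, new_keys):
        if old != new:
            break
        count += 1
    return count


steps = [
    ("FROM", "alpine"),
    ("COPY", "package.json"),
    ("RUN", "npm ci"),
    ("COPY", "src"),
    ("RUN", "npm run build"),
]
before = layer_keys(steps, {"package.json": b"deps v1", "src": b"code v1"})
after_code_change = layer_keys(steps, {"package.json": b"deps v1", "src": b"code v2"})
after_deps_change = layer_keys(steps, {"package.json": b"deps v2", "src": b"code v1"})

print(reused_layers(before, after_code_change))  # 3
print(reused_layers(before, after_deps_change))  # 1
```

In this model a source-only change still reuses the dependency layers, while a change to package.json invalidates everything after the base image.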

Container Runtime

Docker Engine manages container lifecycle: creation, execution, networking, and resource allocation through containerd and runc.

Docker Daemon: Background service managing containers
containerd: High-level runtime managing image pulls, storage, and container lifecycle
runc: Low-level runtime executing containers
OCI Standard: Open Container Initiative specifications

Networking Fundamentals

Docker provides multiple networking modes for container communication: bridge, host, overlay, and custom networks.

Bridge: Default isolated network with port mapping
Host: Container uses host network directly
Overlay: Multi-host networking for swarm clusters
Custom networks: User-defined bridges with DNS
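
Port mapping on a bridge network is written as `[host_ip:]host_port:container_port[/protocol]`, and the order is easy to get backwards. The sketch below parses that form to make the direction explicit. `PortMapping` and `parse_publish` are hypothetical helper names; the parser deliberately ignores port ranges, container-only specs, and IPv6 addresses.

```python
from typing import NamedTuple, Optional


class PortMapping(NamedTuple):
    host_ip: Optional[str]
    host_port: int
    container_port: int
    protocol: str


def parse_publish(spec: str) -> PortMapping:
    """Parse '[host_ip:]host_port:container_port[/protocol]'."""
    ports, _, protocol = spec.partition("/")
    parts = ports.split(":")
    if len(parts) == 2:
        host_ip = None
        host_port, container_port = parts
    elif len(parts) == 3:
        host_ip, host_port, container_port = parts
    else:
        raise ValueError(f"unsupported publish spec: {spec!r}")
    # TCP is the default protocol when none is given.
    return PortMapping(host_ip, int(host_port), int(container_port), protocol or "tcp")


mapping = parse_publish("8080:80")
print(mapping.host_port, "->", mapping.container_port)  # 8080 -> 80
```

The host side always comes first, so `8080:80` exposes a service listening on port 80 inside the container at port 8080 on the host.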

Data Persistence

Containers are ephemeral by design. Use volumes and bind mounts for persistent data storage across container lifecycles.

Volumes: Docker-managed storage, preferred method
Bind mounts: Host filesystem paths mounted in container
tmpfs: In-memory storage for temporary data
Named volumes: Shareable between containers

Container Lifecycle

A container moves through the states created, running, paused, and stopped (reported as "exited" by docker ps). The commands docker create, start, pause, unpause, stop, and rm drive the transitions between them.

Created: Container exists but not started
Running: Active container with running processes
Paused: Processes frozen, memory preserved
Stopped: Main process has ended, exit code recorded
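
The states and transitions can be sketched as a small state machine. This is a simplified model under stated assumptions, not an exhaustive description of Docker's behavior: it covers only start, pause, unpause, and stop, and it labels the stopped state `exited`, which is how `docker ps` reports it. The names `TRANSITIONS`, `apply_command`, and `run_commands` are hypothetical.

```python
# Allowed transitions in this simplified model, keyed by (state, command).
TRANSITIONS = {
    ("created", "start"): "running",
    ("running", "pause"): "paused",
    ("paused", "unpause"): "running",
    ("running", "stop"): "exited",
    ("exited", "start"): "running",
}


def apply_command(state, command):
    """Return the next state, or raise if the model has no such transition."""
    try:
        return TRANSITIONS[(state, command)]
    except KeyError:
        raise ValueError(f"cannot {command} a {state} container") from None


def run_commands(commands, state="created"):
    """Apply a sequence of commands starting from the given state."""
    for command in commands:
        state = apply_command(state, command)
    return state


print(run_commands(["start", "pause", "unpause", "stop"]))  # exited
```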

Docker Architecture Overview

┌─────────────────────────────────────────────────────────┐
│                      Docker Client                      │
│  docker build | docker run | docker push | docker pull  │
└─────────────────────┬───────────────────────────────────┘
                      │ REST API
┌─────────────────────▼───────────────────────────────────┐
│                      Docker Daemon                      │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐  │
│  │   Images    │  │ Containers  │  │    Networks     │  │
│  └─────────────┘  └─────────────┘  └─────────────────┘  │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────────┐  │
│  │   Volumes   │  │  Registry   │  │     Plugins     │  │
│  └─────────────┘  └─────────────┘  └─────────────────┘  │
└─────────────────┬───────────────────────────────────────┘
                  │
┌─────────────────▼───────────────────────────────────────┐
│                    Container Runtime                    │
│            containerd → runc → Linux Kernel             │
└─────────────────────────────────────────────────────────┘

Essential Commands

# Image operations
docker build -t myapp:latest .
docker pull ubuntu:20.04
docker images
docker rmi image_id

# Container lifecycle
# -p maps HOST:CONTAINER; nginx listens on port 80 inside the container
docker run -d --name web -p 8080:80 nginx
docker ps -a
docker exec -it web /bin/bash
docker logs web
docker stop web
docker rm web

# Network and volumes
docker network create mynet
docker volume create mydata
docker run -v mydata:/data ubuntu

2. Dockerfile Mastery & Multi-Stage Builds

Crafting efficient, secure, and maintainable container images with advanced Dockerfile techniques.

Dockerfile Instructions

FROM: Base image selection
RUN: Execute commands during build
COPY: Add files from build context
ADD: Like COPY, plus remote URL download and automatic extraction of local tar archives
WORKDIR: Set working directory
ENV: Set environment variables
EXPOSE: Document port usage
CMD: Default command to run
ENTRYPOINT: Main executable; CMD supplies its default arguments (override with --entrypoint)
USER: Set execution user
HEALTHCHECK: Container health monitoring

Layer Optimization

Combine RUN commands with && to reduce layers
Clean up package caches in same RUN instruction
Order instructions by change frequency
Use .dockerignore to exclude unnecessary files
Leverage build cache effectively
Use specific tags instead of 'latest'

Multi-Stage Build Example

# Build stage
FROM node:16-alpine AS builder
WORKDIR /app
# Copy manifests first so the dependency layer stays cached until they change
COPY package*.json ./
# Install all dependencies; the build step usually needs devDependencies
RUN npm ci

COPY . .
# Build, then drop devDependencies so only runtime packages are copied forward
RUN npm run build && npm prune --omit=dev

# Production stage
FROM node:16-alpine AS production
RUN addgroup -g 1001 -S nodejs && \
    adduser -S nextjs -u 1001 -G nodejs

WORKDIR /app
COPY --from=builder --chown=nextjs:nodejs /app/dist ./dist
COPY --from=builder /app/node_modules ./node_modules
COPY package.json ./

USER nextjs
EXPOSE 3000
# Alpine-based Node images ship BusyBox wget but not curl
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
  CMD wget -q --spider http://localhost:3000/health || exit 1

CMD ["node", "dist/server.js"]

Advanced Dockerfile Patterns

Security Hardening

Use minimal base images (alpine, distroless)
Run as non-root user
Scan images for vulnerabilities
Use specific package versions
Remove unnecessary tools and packages

Build Arguments & Secrets

ARG for build-time variables
ENV for runtime environment
BuildKit secrets for sensitive data
Multi-platform builds with buildx

3. Docker Compose & Multi-Container Applications

Orchestrating complex applications with multiple services, databases, and networking.

Compose File Structure

version: Compose file format version (obsolete in the Compose Specification; Docker Compose v2 ignores it)
services: Application components definition
networks: Custom network configuration
volumes: Named volume definitions
configs: Configuration file management
secrets: Sensitive data handling

Service Configuration

image/build: Container source
ports: Host to container mapping
volumes: Data persistence
environment: Runtime variables
depends_on: Service dependencies
restart: Restart policy
healthcheck: Service monitoring

Production-Ready Compose Example

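# No top-level version key: it is obsolete in the Compose Specification, and the long depends_on form below is not valid in the legacy 3.x format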

services:
  web:
    build:
      context: .
      dockerfile: Dockerfile.prod
      args:
        NODE_ENV: production
    ports:
      - "80:3000"
    environment:
      # Placeholder credentials for illustration; do not commit real passwords in Compose files
      - DATABASE_URL=postgresql://user:pass@db:5432/myapp
      - REDIS_URL=redis://redis:6379
    volumes:
      - ./uploads:/app/uploads
    depends_on:
      db:
        condition: service_healthy
      redis:
        condition: service_started
    restart: unless-stopped
    healthcheck:
      test: ["CMD", "curl", "-f", "http://localhost:3000/health"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 40s

  db:
    image: postgres:14-alpine
    environment:
      POSTGRES_DB: myapp
      POSTGRES_USER: user
      POSTGRES_PASSWORD_FILE: /run/secrets/db_password
    volumes:
      - postgres_data:/var/lib/postgresql/data
      - ./init.sql:/docker-entrypoint-initdb.d/init.sql:ro
    secrets:
      - db_password
    healthcheck:
      test: ["CMD-SHELL", "pg_isready -U user -d myapp"]
      interval: 10s
      timeout: 5s
      retries: 5

  redis:
    image: redis:7-alpine
    command: redis-server --appendonly yes
    volumes:
      - redis_data:/data
    restart: unless-stopped

  nginx:
    image: nginx:alpine
    ports:
      - "443:443"
    volumes:
      - ./nginx.conf:/etc/nginx/nginx.conf:ro
      - ./ssl:/etc/ssl:ro
    depends_on:
      - web
    restart: unless-stopped

volumes:
  postgres_data:
    driver: local
  redis_data:
    driver: local

secrets:
  db_password:
    file: ./secrets/db_password.txt

networks:
  default:
    driver: bridge
    ipam:
      config:
        - subnet: 172.20.0.0/16
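
The depends_on entries in the file above imply a startup order. The sketch below derives that order with a topological sort, using the service names from the example. It models ordering only: it does not wait for health checks the way `condition: service_healthy` does, and `startup_batches` is a hypothetical helper. It assumes Python 3.9 or newer for `graphlib`.

```python
from graphlib import TopologicalSorter

# Each service maps to the set of services it depends on.
depends_on = {
    "web": {"db", "redis"},
    "db": set(),
    "redis": set(),
    "nginx": {"web"},
}


def startup_batches(graph):
    """Group services into batches; each batch depends only on earlier batches."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


print(startup_batches(depends_on))  # [['db', 'redis'], ['web'], ['nginx']]
```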

Compose Commands

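# Compose v2 is invoked as "docker compose"; the standalone v1 binary was "docker-compose"

# Development workflow
docker compose up -d
docker compose logs -f web
docker compose exec web npm test
docker compose down

# Production deployment (pull new images first, then recreate changed services)
docker compose -f docker-compose.prod.yml pull
docker compose -f docker-compose.prod.yml up -d
docker compose -f docker-compose.prod.yml restart web

# Scaling services (the scaled service must not publish a fixed host port such as "80:3000")
docker compose up -d --scale web=3
docker compose ps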

4. Container Orchestration & Kubernetes

Scaling beyond single hosts with orchestration platforms, focusing on Kubernetes fundamentals.

Kubernetes Architecture

Control plane and worker node architecture: the control plane schedules and manages containerized workloads that run on worker nodes.

Control plane: API Server, etcd, Controller Manager, Scheduler
Nodes: kubelet, kube-proxy, Container Runtime
Pods: Smallest deployable units, one or more containers
Services: Network abstraction for pod communication

Core Resources

Essential Kubernetes objects for application deployment and management.

Deployment: Manages replica sets and rolling updates
Service: Load balancing and service discovery
ConfigMap: Configuration data separation
Secret: Sensitive information management
Ingress: HTTP/HTTPS routing rules
PersistentVolume: Storage abstraction

Docker Swarm

Docker's native orchestration solution for simpler cluster management.

Swarm mode: Built into Docker Engine
Services: Declarative service definitions
Stacks: Multi-service applications
Secrets/Configs: Centralized management
Rolling updates: Zero-downtime deployments

Kubernetes Deployment Example

apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  labels:
    app: web-app
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1
      maxSurge: 1
  selector:
    matchLabels:
      app: web-app
  template:
    metadata:
      labels:
        app: web-app
    spec:
      containers:
      - name: web
        image: myapp:v1.2.0
        ports:
        - containerPort: 3000
        env:
        - name: DATABASE_URL
          valueFrom:
            secretKeyRef:
              name: app-secrets
              key: database-url
        resources:
          requests:
            memory: "128Mi"
            cpu: "100m"
          limits:
            memory: "256Mi"
            cpu: "200m"
        livenessProbe:
          httpGet:
            path: /health
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5

---
apiVersion: v1
kind: Service
metadata:
  name: web-app-service
spec:
  selector:
    app: web-app
  ports:
  - protocol: TCP
    port: 80
    targetPort: 3000
  type: LoadBalancer
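
The rollingUpdate settings in the Deployment above bound how many pods exist during a rollout: at least replicas minus maxUnavailable pods stay available, and at most replicas plus maxSurge pods exist in total. The sketch below works through that arithmetic. Percentage values are resolved against the replica count, rounding down for maxUnavailable and up for maxSurge. `resolve` and `rollout_bounds` are hypothetical helper names.

```python
def resolve(value, replicas, round_up):
    """Turn an absolute count or a percentage string into a pod count."""
    if isinstance(value, str) and value.endswith("%"):
        scaled = int(value[:-1]) * replicas
        # Integer arithmetic avoids floating-point rounding surprises.
        return -(-scaled // 100) if round_up else scaled // 100
    return int(value)


def rollout_bounds(replicas, max_unavailable, max_surge):
    """Return (minimum available pods, maximum total pods) during a rollout."""
    unavailable = resolve(max_unavailable, replicas, round_up=False)
    surge = resolve(max_surge, replicas, round_up=True)
    return replicas - unavailable, replicas + surge


print(rollout_bounds(3, 1, 1))           # (2, 4)
print(rollout_bounds(10, "25%", "25%"))  # (8, 13)
```

For the example manifest (3 replicas, maxUnavailable 1, maxSurge 1), at least 2 pods keep serving while at most 4 exist at once.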

5. Container Security & Best Practices

Securing containers throughout the development lifecycle and in production environments.

Image Security

Use official or verified base images
Keep images updated with security patches
Scan images for vulnerabilities (Trivy, Snyk)
Use distroless or minimal base images
Sign images with Docker Content Trust
Implement image admission controllers

Runtime Security

Run containers as non-root users
Use read-only filesystems where possible
Drop unnecessary Linux capabilities
Implement resource limits (CPU, memory)
Use security contexts and Pod Security Standards
Network segmentation and policies

Secrets Management

Never embed secrets in images
Use Docker secrets or Kubernetes secrets
Rotate secrets regularly
Use external secret management (HashiCorp Vault, AWS Secrets Manager)
Encrypt secrets at rest and in transit
Audit secret access
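
Docker and Kubernetes secrets are usually delivered to the container as files, for example under /run/secrets. One common pattern is for the application to accept either a plain environment variable or a companion variable ending in `_FILE` that points at the mounted secret, as POSTGRES_PASSWORD_FILE does in the Compose example. The sketch below shows one way an application might read such a value. `read_secret` and `DB_PASSWORD` are hypothetical names, and the temporary file stands in for a mounted secret.

```python
import os
import tempfile
from pathlib import Path


def read_secret(name, env=os.environ):
    """Return the secret from the file named by NAME_FILE, else from NAME."""
    file_path = env.get(f"{name}_FILE")
    if file_path:
        # Mounted secret files often end with a newline; strip it.
        return Path(file_path).read_text().strip()
    value = env.get(name)
    if value is None:
        raise KeyError(f"neither {name}_FILE nor {name} is set")
    return value


# Demonstration with a temporary file standing in for /run/secrets/db_password.
with tempfile.TemporaryDirectory() as tmp:
    secret_file = Path(tmp) / "db_password"
    secret_file.write_text("example-password\n")
    password = read_secret("DB_PASSWORD", {"DB_PASSWORD_FILE": str(secret_file)})

print(len(password))  # 16
```

Preferring the file keeps the value out of `docker inspect` output and out of the process environment.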

Network Security

Use custom networks instead of default bridge
Implement network policies for pod communication
Use TLS for all inter-service communication
Restrict ingress/egress traffic
Use service mesh for advanced traffic management
Monitor network traffic patterns

Security Scanning Example

# Vulnerability scanning with Trivy
trivy image myapp:latest

# Docker Bench Security audit
docker run --rm --net host --pid host --userns host --cap-add audit_control \
    -e DOCKER_CONTENT_TRUST=$DOCKER_CONTENT_TRUST \
    -v /etc:/etc:ro \
    -v /usr/bin/containerd:/usr/bin/containerd:ro \
    -v /usr/bin/runc:/usr/bin/runc:ro \
    -v /usr/lib/systemd:/usr/lib/systemd:ro \
    -v /var/lib:/var/lib:ro \
    -v /var/run/docker.sock:/var/run/docker.sock:ro \
    --label docker_bench_security \
    docker/docker-bench-security

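# Secure Dockerfile example
FROM node:16-alpine
RUN addgroup -S appgroup && adduser -S appuser -G appgroup
WORKDIR /app
# Install dependencies as root before copying sources: /app is root-owned,
# so npm ci would fail after USER appuser, and this layer stays cached
COPY package*.json ./
RUN npm ci --omit=dev
COPY --chown=appuser:appgroup . .
# Drop privileges for runtime
USER appuser
EXPOSE 3000
CMD ["node", "server.js"]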

7. Production Best Practices

Enterprise-grade patterns for scalable, maintainable containerized applications.

Performance Optimization

Use multi-stage builds to reduce image size
Implement proper caching strategies
Optimize layer ordering for build cache
Use .dockerignore effectively
Choose appropriate base images
Configure resource limits and requests
Use health checks for reliability

Monitoring & Logging

Centralized logging with structured formats
Container metrics collection (Prometheus)
Distributed tracing for microservices
Application performance monitoring
Log aggregation and analysis
Alert on container failures and resource usage
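
Centralized logging works best when each container writes one structured record per line to stdout and leaves collection to the platform. The sketch below is one minimal way to emit JSON log lines with Python's standard logging module. `JsonFormatter`, `build_logger`, and the field names are illustrative assumptions, not a required schema.

```python
import io
import json
import logging
import sys


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object."""

    def format(self, record):
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_logger(name, stream=sys.stdout):
    """Create a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    logger.handlers = [handler]
    logger.propagate = False
    return logger


# Capture output in memory here; in a container the default stdout is used.
buffer = io.StringIO()
log = build_logger("web", buffer)
log.info("listening on port %d", 3000)
entry = json.loads(buffer.getvalue())
print(entry["level"], entry["message"])  # INFO listening on port 3000
```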

CI/CD Integration

Automated image builds on code changes
Security scanning in build pipeline
Semantic versioning for images
Blue-green deployments
Canary releases for risk mitigation
Rollback mechanisms
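
Semantic version tags must be compared numerically, not as strings, or v1.10.0 sorts before v1.9.3. The sketch below picks the newest release tag from a list, which is also a simple way to choose a rollback target. It is a minimal illustration: it accepts only MAJOR.MINOR.PATCH with an optional leading "v" and skips everything else, including pre-release tags. `parse_tag` and `newest_tag` are hypothetical helper names.

```python
import re

SEMVER_TAG = re.compile(r"^v?(\d+)\.(\d+)\.(\d+)$")


def parse_tag(tag):
    """Return (major, minor, patch) for a release tag, or None if it is not one."""
    match = SEMVER_TAG.match(tag)
    return tuple(int(part) for part in match.groups()) if match else None


def newest_tag(tags):
    """Return the tag with the highest semantic version."""
    versioned = [(parse_tag(tag), tag) for tag in tags if parse_tag(tag) is not None]
    if not versioned:
        raise ValueError("no semantic version tags found")
    return max(versioned)[1]


tags = ["v1.2.0", "v1.10.0", "v1.9.3", "latest"]
print(newest_tag(tags))  # v1.10.0
```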

Disaster Recovery

Regular backups of persistent data
Multi-region deployments
Database replication strategies
Automated failover mechanisms
Recovery time objectives (RTO) planning
Regular disaster recovery testing
